175 research outputs found

Role of intangible assets and their assessment in the modern economy of Ukraine

Author: Kurgan L. P.
Publication date: 01/01/2019

Possession of intangible assets is an important competitive advantage, but the company's board of directors and management do not always take it into account when formulating the development strategy.

Tavria State Agrotechnological University

Prediction of protein structural classes for low-homology sequences based on predicted secondary structure

Author: A Anand
A Fiser
A Murzin
C Anfinsen
C Chen
CLJ Webber
DT Jones
F Birzele
HB Shen
HJ Jeffrey
HN Lin
I Bahar
J Qi
Jian-Yi Yang
JP Eckmann
JP Zbilut
JY Yang
K Chen
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KD Kedarisetti
L Kurgan
L Kurgan
LA Kurgan
M Duan
M Levitt
RO Duda
S Costantini
SF Altschul
TL Zhang
Xin Chen
Z Aydin
ZD Zhang
ZG Yu
Zhen-Ling Peng
ZX Wang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Prediction of protein structural classes (α, β, α + β and α/β) from amino acid sequences is of great importance, as it benefits the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach up to 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, they perform relatively poorly, with prediction accuracy often below 60%.
Results: We propose a new method to predict protein structural classes on the basis of features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure for each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of α helices and β strands. Thus, it is a valuable method to predict protein structural classes, particularly for low-homology amino acid sequences.
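The chaos game step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the geometry, so everything specific here is an assumption: three secondary-structure states (H, E, C) are placed at the vertices of an equilateral triangle, each residue moves the current point halfway toward its state's vertex, and the x and y coordinates are read off as the two time series. The names `VERTICES` and `chaos_game_series` are hypothetical and are not taken from the paper.

```python
import math

# Assumed vertex layout for the three secondary-structure states.
# The paper's actual coordinates are not stated in the abstract.
VERTICES = {
    "H": (0.0, 0.0),                 # helix
    "E": (1.0, 0.0),                 # strand
    "C": (0.5, math.sqrt(3) / 2),    # coil
}


def chaos_game_series(ss, start=(0.5, math.sqrt(3) / 6)):
    """Map a secondary-structure string to two coordinate time series.

    Each step moves the current point to the midpoint between itself
    and the vertex of the current state (the usual chaos game rule).
    The default start is the triangle's centroid (an assumption).
    """
    x, y = start
    xs, ys = [], []
    for state in ss:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


# One point per residue; xs and ys are the two time series.
xs, ys = chaos_game_series("CCHHHHHHCCEEEEC")
```

Features such as recurrence statistics would then be computed on `xs` and `ys`; that part is not sketched here because the abstract does not describe it in enough detail.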

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DR-NTU (Digital Repository of NTU)

Accurate Prediction of Protein Structural Class

Author: AG Murzin
CA Orengo
CB Anfinsen
G Deleage
H Nakashima
HB Shen
I Bahar
JY Yang
JY Yang
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KD Kedarisetti
KD Pruitt
L Dong
L Kurgan
L Kurgan
L Kurgan
Meng Ge
MJ Mizianty
P Baldi
RY Luo
S Costantini
S Costantini
SE Brenner
SF Altschul
SM Muska
T Liu
T Liu
TG Liu
Vladimir N. Uversky
W Li
WS Bu
X Xiao
X Xiao
Xia-Yu Xia
Xian-Ming Pan
XM Pan
Y Cai
YD Cai
YD Cai
ZC Li
Zhi-Xin Wang
ZX Wang
Publication venue: Public Library of Science
Publication date: 01/01/2012

Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on a widely used, non-homologous benchmark dataset, 25PDB. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods and had an overall accuracy of 83.1%, which is even higher than the results of methods based on predicted secondary structure.
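The idea of regressing a high-dimensional feature vector onto a 4D class vector can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the targets are one-hot class indicators, the fit is ordinary least squares via NumPy, the final class is taken as the largest of the four outputs, and the toy data uses 10 synthetic features instead of the 440 PSSM-derived ones. The function names `fit_mlr` and `predict_class` are hypothetical.

```python
import numpy as np


def fit_mlr(features, labels, n_classes=4):
    """Least-squares map from feature vectors to one-hot class vectors."""
    X = np.asarray(features, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    Y = np.eye(n_classes)[np.asarray(labels)]       # one-hot targets
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W                                        # shape: (n_features + 1, n_classes)


def predict_class(W, features):
    """Project features to the 4D vector and pick the largest component."""
    X = np.asarray(features, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)


# Synthetic stand-in data: four well-separated clusters in 10 dimensions.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=200)
centers = 3.0 * rng.normal(size=(4, 10))
features = centers[labels] + 0.1 * rng.normal(size=(200, 10))

W = fit_mlr(features, labels)
accuracy = float(np.mean(predict_class(W, features) == labels))
```

The accuracy here is measured on the training data of a toy problem, so it says nothing about the 83.1% figure reported above, which comes from cross-validation and jackknife tests on real domains.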

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

Author: A Anand
A Andreeva
A Elofsson
A Krogh
A Paiardini
A Reinhardt
AG Murzin
AY Istomin
B Niu
B Rost
B Rost
C Chen
C Chen
C Orengo
C Zheng
CA Floudas
D Aha
D Jones
D Jones
D Przybylski
EP Carpenter
F Gu
G John
G von Heijne
GP Zhou
H Bigelow
H He
H Kim
H Liu
H Zhang
HM Berman
I Majumdar
I Witten
IB Kuznetsov
J Ruan
J Song
JM Bujnicki
JY Yang
K Bryson
K Chen
K Ginalski
K Kedarisetti
K Kedarisetti
K Tomii
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KY Feng
L Carlacci
L Dong
L Homaeian
L Jin
LA Kurgan
LA Kurgan
LA Kurgan
LA Kurgan
LA Kurgan
LT Huang
Lukasz Kurgan
M Punta
M Punta
M Robnik-Sikonja
MA Hall
Marcin J Mizianty
MM Gromiha
MM Gromiha
MM Gromiha
O Gotoh
OV Galzitskaya
P Baldi
P Langley
P Raman
QS Du
R Apweiler
R Gupta
R Kohavi
RL Dunbrack
RL Marsden
S Brenner
S Cessie
S Costantini
S Costantini
S Jahandideh
S Jahandideh
S Keerthi
S Lee
S Wu
SF Altschul
SR Amirova
T Liu
TF Smith
TL Zhang
TL Zhang
W Chen
X Xiao
X Xiao
X Xiao
X Zheng
Y Cai
Y Cai
Y Cai
Y Cai
Y Cao
Y Zhang
YD Cai
YK Yu
YS Ding
YS Ding
Z Xiang
Z Zhang
ZC Li
ZC Li
ZX Wang
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high-identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences.
Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes, which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Exploiting the bin-class histograms for feature selection on discrete data

Author: A Ferreira
B Franay
G Brown
G Strang
I Witten
L Kurgan
N Srebro
R Duda
S Garcia
T Cover
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio
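As a rough illustration of what a bin-class histogram is, the sketch below counts how often each (bin, class) pair occurs for one discrete feature. The abstract does not define the proposed relevance and redundancy criteria, so they are not reproduced; instead the histogram is used to compute mutual information, which the abstract names as one of the widely used baseline criteria. The function names are hypothetical.

```python
import math
from collections import Counter


def bin_class_histogram(feature, labels):
    """Count occurrences of each (bin value, class label) pair."""
    return Counter(zip(feature, labels))


def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and the class.

    This is the baseline relevance criterion mentioned in the abstract,
    not the criterion proposed in the paper.
    """
    n = len(feature)
    joint = bin_class_histogram(feature, labels)
    bin_counts = Counter(feature)
    class_counts = Counter(labels)
    mi = 0.0
    for (b, c), count in joint.items():
        p_joint = count / n
        p_indep = (bin_counts[b] / n) * (class_counts[c] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


# A feature whose bins line up with the classes is maximally informative,
# while one that is independent of the classes carries no information.
informative = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
uninformative = mutual_information([0, 1, 0, 1], ["a", "a", "b", "b"])
```

Ranking features by such a score and keeping the top ones is one common way to use a relevance criterion; whether the paper ranks features this way is not stated in the abstract.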

Repositório Científico do Instituto Politécnico de Lisboa

Crossref

Integration of decision support systems to improve decision support performance

Author: A Kaklauskas
A Kusiak
AC Marquez
AHB Duffy
Alex H. B. Duffy
B Chae
B Lopez
C Carlsson
C Silva
CD Evans
D Lam
D Mladenic
D Riecken
D Thapa
D Zhang
DA Guerra-Zubiaga
DR Dolk
DR Dolk
DS Linthicum
E Claver
E Thomsen
EJM Lauria
F Kebair
FD Turck
G DeSanctis
G Niu
GD Bhatt
GM Carter
H Lan
HA Simon
HA Simon
HA Simon
HY Lin
I Bose
I Boyle
I Thomas
I Truck
Iain M. Boyle
IH Witten
IK Bindoff
J Kolodner
J Zeleznikow
JE Nelson
JF Courtney
JH Lee
JH Lee
JJ Elam
JO Grady
JP Costa
JP Shim
K Eisenhardt
K Kristensen
K Pal
KQ Byung
KW Lee
L Ding
L Ekenberg
L Ekenberg
L Lin
LA Kurgan
M Alvarado
M Beynon
M Bradford
M Cohen
M Frize
M Harrison
M Limayem
M Wang
MJ Huang
MJ Shaw
ML Markus
MN Huhns
N Bolloju
NR Jennings
O Kwon
P Keen
P Keen
PA Rodgers
PC Nutt
QF Ni
R Anderson
R Anson
R Bellazzi
R Chalmeta
R Denzer
R Kimball
R Orwig
R Vahidov
RE Giachetti
RH Rao
Robert Ian Whitfield
RP Baker
RW Blanning
S Daskalaki
S Liu
S Liu
S Liu
S Szykman
SA Raghavan
SB Eom
SD Pinson
Shaofeng Liu
T Bui
TH Davenport
TJ Hess
TP Gerrity
WA Muhanna
WD Li
WD Li
Y Reich
Y Zhu
YC Tsai
Z Shi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2010

Decision support system (DSS) is a well-established research and development area. Traditional isolated, stand-alone DSS has been recently facing new challenges. In order to improve the performance of DSS to meet the challenges, research has been actively carried out to develop integrated decision support systems (IDSS). This paper reviews the current research efforts with regard to the development of IDSS. The focus of the paper is on the integration aspect for IDSS through multiple perspectives, and the technologies that support this integration. More than 100 papers and software systems are discussed. Current research efforts and the development status of IDSS are explained, compared and classified. In addition, future trends and challenges in integration are outlined. The paper concludes that by addressing integration, better support will be provided to decision makers, with the expectation of both better decisions and improved decision making processes

Crossref

University of Strathclyde Institutional Repository

iLearnPlus: a comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization

Author: Akutsu T.
Chen Y.-Z.
Chen Z.
Daly R.J.
Kurgan L.
Li C.
Li F.
Song J.
Webb G.I.
Xiang D.
Zhao P.
Zhao Q.
Publication venue: Oxford University Press
Publication date: 01/01/2021

Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.
Zhen Chen, Pei Zhao, Chen Li, Fuyi Li, Dongxu Xiang, Yong-Zi Chen, Tatsuya Akutsu, Roger J. Daly, Geoffrey I. Webb, Quanzhi Zhao, Lukasz Kurgan, and Jiangning Song

Adelaide Research & Scholarship

PubMed Central

Monash University Research Portal

University of Melbourne Institutional Repository

iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets

Author: Akutsu T.
Bain C.
Chen Z.
Gao X.
Gasser R.B.
Kurgan L.
Li C.
Li F.
Li J.
Liu X.
Song J.
Wang Y.
Yang Z.
Zhao P.
Publication venue: Oxford University Press
Publication date: 01/01/2022

The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acid data, there are no one-stop computational toolkits that comprehensively characterize a wide range of biomolecules. We address this vital need by developing a holistic platform that generates features from sequence and structural data for a diverse collection of molecule types. Our freely available and easy-to-use iFeatureOmega platform generates, analyzes and visualizes 189 representations for biological sequences, structures and ligands. To the best of our knowledge, iFeatureOmega provides the largest scope when directly compared to the current solutions, in terms of the number of feature extraction and analysis approaches and coverage of different molecules. We release three versions of iFeatureOmega, including a webserver, command line interface and graphical interface, to satisfy the needs of experienced bioinformaticians and less computer-savvy biologists and biochemists. With the assistance of iFeatureOmega, users can encode their molecular data into representations that facilitate construction of predictive models and analytical studies. We highlight benefits of iFeatureOmega based on three research applications, demonstrating how it can be used to accelerate and streamline research in bioinformatics, computational biology, and cheminformatics areas. The iFeatureOmega webserver is freely available at http://ifeatureomega.erc.monash.edu and the standalone versions can be downloaded from https://github.com/Superzchen/iFeatureOmega-GUI/ and https://github.com/Superzchen/iFeatureOmega-CLI/.
Zhen Chen, Xuhan Liu, Pei Zhao, Chen Li, Yanan Wang, Fuyi Li, Tatsuya Akutsu, Chris Bain, Robin B. Gasser, Junzhou Li, Zuoren Yang, Xin Gao, Lukasz Kurgan, and Jiangning Song

Adelaide Research & Scholarship

PubMed Central

Automatic structure classification of small proteins using random forest

Author: A Andreeva
A Andreeva
AG Murzin
AV Levitin
C Hadley
CHQ Ding
E Ie
G Zhanga
H Shen
HM Berman
I Chung
I Melvin
IH Witten
J Cheng
J Wu
JE Gewehr
JF Gibrat
Jonathan D Hirst
JR Quinlan
K Chen
KC Chou
L Breiman
L Holm
L Kurgan
M Gerstein
MB Swindells
MTA Shamim
O Çamoğlu
P Baldi
P Han
P Jain
P Klein
Pooja Jain
S Kim
S Mile
S Vinga
SE Brenner
SE Hamby
SF Altschul
SP Kanaan
SS Krishna
U Hobohm
V Sam
W Kabsch
X Chen
X Chen
XM Zhao
Y Cai
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Random forest, an ensemble-based supervised machine learning algorithm, is used to predict the SCOP structural classification for a target structure, based on the similarity of its structural descriptors to those of a template structure with an equal number of secondary structure elements (SSEs). An initial assessment of random forest is carried out for domains consisting of three SSEs. The usability of random forest in classifying larger domains is demonstrated by applying it to domains consisting of four, five and six SSEs.
Results: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent and non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Super-family or Family levels, the predictive quality of the model in terms of the Matthews correlation coefficient (MCC) ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to different structural levels decreases.
Conclusions: The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Super-family or Family levels is demonstrated. Issues such as the introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification for those protein structures from the PDB which are yet to be assigned to the SCOP classification hierarchy.
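For readers unfamiliar with the MCC values quoted above, the following sketch computes the standard binary Matthews correlation coefficient from confusion-matrix counts. How the paper applies it across the multi-level SCOP classification (for example, one class versus the rest) is not stated in the abstract, so this shows only the basic two-class definition. The function name is hypothetical.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from true/false positive and negative counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)    # every prediction correct
inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)   # every prediction wrong
chance = matthews_corrcoef(tp=1, tn=1, fp=1, fn=1)     # no better than guessing
```

The score ranges from -1 to 1, so the reported range of 0.61 to 0.83 indicates predictions well above chance but short of perfect agreement.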

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Nearest neighbor imputation algorithms: a critical evaluation

Author: Alessandro Santaniello
DB Rubin
J Barnard
J Hardt
JL Schafer
L Breiman
L Breiman
LA Kurgan
Lorenzo Beretta
MR Šikonja
NB Karayiannis
O Troyanskaya
P Hall
PD Allison
RJA Little
RR Andridge
S Greenland
S Yenduri
SG Liao
Publication venue: Springer Science and Business Media LLC

Crossref